[slimtensor] integration into backend #16565

Merged
meta-codesync[bot] merged 25 commits into gh/gasoonjia/101/base from gh/gasoonjia/101/head
Jan 30, 2026

Conversation

@Gasoonjia (Contributor) commented Jan 13, 2026

This diff makes the CUDA backend actually use SlimTensor.

It:
1. updates cuda_backend to create a SlimTensor from a given ETensor (a sketch of this non-owning conversion follows below)
2. removes the duplicate ETensor-driven shim layers under cuda_backend
3. updates the CMake logic in both the CUDA backend and the AOTI backend

Perf is unchanged from before:

[benchmark screenshot]

Worth noticing that we are still keeping two sets of common shims: one ETensor-based set for the Metal backend, and one SlimTensor-based set used by the CUDA backend, so as not to impact Metal backend work.
Once the Metal backend finishes its migration, we should delete the duplicate common shims and keep only the SlimTensor-based set.
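
As a concrete picture of step 1, here is a minimal, hypothetical sketch of a non-owning `from_blob()`-style conversion. The `SlimTensorView` struct and the standalone `from_blob()` helper are illustrative stand-ins, not the real SlimTensor types; the point is only that the view borrows the ETensor's buffer instead of copying it.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Placeholder dtype tag; the real SlimTensor supports a fixed set of dtypes.
enum class ScalarType { Float, Long };

// Minimal non-owning tensor view: records the buffer pointer and layout,
// but never allocates or frees memory.
struct SlimTensorView {
  void* data;  // borrowed; owned by the original ETensor
  std::vector<int64_t> sizes;
  std::vector<int64_t> strides;
  ScalarType dtype;
};

// from_blob-style factory: the caller guarantees `data` outlives the view.
SlimTensorView from_blob(
    void* data,
    std::vector<int64_t> sizes,
    std::vector<int64_t> strides,
    ScalarType dtype) {
  return SlimTensorView{data, std::move(sizes), std::move(strides), dtype};
}

int main() {
  float buffer[6] = {0};  // stands in for an ETensor's storage
  SlimTensorView view = from_blob(buffer, {2, 3}, {3, 1}, ScalarType::Float);
  // `view` aliases `buffer`: no copy happens, which is how ETensor memory
  // can back the SlimTensor-based shims.
  return view.sizes[0] == 2 ? 0 : 1;
}
```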

Stack from ghstack (oldest at bottom):

Differential Revision: D90606409

pytorch-bot bot commented Jan 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16565

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 3 Unrelated Failures

As of commit f8a812e with merge base 1df4dac:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Jan 13, 2026
This was referenced Jan 13, 2026
Gasoonjia added a commit that referenced this pull request Jan 13, 2026
Differential Revision: [D90606409](https://our.internmc.facebook.com/intern/diff/D90606409/)

ghstack-source-id: 333239044
Pull Request resolved: #16565
Gasoonjia added a commit that referenced this pull request Jan 27, 2026
Pull Request resolved: #16565



Perf is unchanged from before.
{F1984962152}

ghstack-source-id: 336200461
@exported-using-ghexport

Differential Revision: [D90606409](https://our.internmc.facebook.com/intern/diff/D90606409/)
Gasoonjia added a commit that referenced this pull request Jan 27, 2026
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* #16565
* #16551
* #16469
* #16457
* #16455
* #16454
* #16453
* #16452
* #16451
* #16450
* #16449
* #16448
* #16447
* #16446
* __->__ #16724

Copy CUDAGuard and CUDAStreamGuard from cuda/runtime/ to aoti/slim/cuda/
to satisfy SlimTensor's requirements while getting rid of a potential
circular dependency:
- cuda_backend/main_functionalities -> aoti/slimtensor ->
cuda_backend/cuda_guard

This change:
- copies guard.h, guard.cpp, and the test files from backend/cuda_backend
to backend/aoti/slim/cuda/ (a sketch of the guard pattern follows below)

Differential Revision:
[D91056808](https://our.internmc.facebook.com/intern/diff/D91056808/)
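
For reference, the RAII device-guard pattern these classes implement looks roughly like the sketch below. It uses only the public CUDA runtime API (`cudaGetDevice`/`cudaSetDevice`); the `DeviceGuard` name and the omission of stream handling and error checking are simplifications, not the real CUDAGuard/CUDAStreamGuard.

```cpp
#include <cuda_runtime.h>

// RAII guard: remember the current device, switch to the requested one,
// and restore the original device when the scope ends.
class DeviceGuard {
 public:
  explicit DeviceGuard(int new_device) {
    cudaGetDevice(&prev_device_);
    if (new_device != prev_device_) {
      cudaSetDevice(new_device);
    }
  }
  ~DeviceGuard() {
    cudaSetDevice(prev_device_);  // restore on scope exit
  }
  DeviceGuard(const DeviceGuard&) = delete;
  DeviceGuard& operator=(const DeviceGuard&) = delete;

 private:
  int prev_device_ = 0;
};

int main() {
  {
    DeviceGuard guard(0);  // pin device 0 for this scope
    // ... kernel launches and copies targeting device 0 ...
  }  // previous device restored here
  return 0;
}
```

Copying the guard into aoti/slim/cuda/ breaks the cycle because the slim layer no longer has to reach back into cuda_backend for it.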
Gasoonjia added a commit that referenced this pull request Jan 27, 2026
…v2 (#16446)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* #16565
* #16551
* #16469
* #16457
* #16455
* #16454
* #16453
* #16452
* #16451
* #16450
* #16449
* #16448
* #16447
* __->__ #16446
* #16724

Add SlimTensor-based implementations of AOTI shim functions for tensor
creation:

1. `aoti_torch_create_tensor_from_blob_v2()` - creates a non-owning
SlimTensor that wraps existing memory using the `from_blob()` factory
(a sketch of this shim's shape follows after this commit message)

These functions support CPU and CUDA devices and handle all 7 SlimTensor
dtypes.

Also add `memory_slim.h` and `memory_slim.cpp` with SlimTensor-based shim
implementations, so the new API can be developed without impacting the
current pipeline. memory_slim.{h/cpp} will replace the current
memory.{h/cpp} once everything is set up.

Differential Revision:
[D90126247](https://our.internmc.facebook.com/intern/diff/D90126247/)
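
Here is a hedged sketch of what a C-ABI shim in this style looks like. The simplified signature, the `kOk`/`kError` codes, and the `create_tensor_from_blob_sketch` name are illustrative assumptions, not the actual ExecuTorch declaration; the real shim constructs a SlimTensor via `from_blob()` rather than returning the raw pointer as the handle.

```cpp
#include <cstdint>

using AOTITorchError = int32_t;
constexpr AOTITorchError kOk = 0;
constexpr AOTITorchError kError = 1;

// Opaque handle returned to AOTI-generated code. To keep this sketch
// self-contained the handle is just the caller's buffer; the real shim
// would heap-allocate a SlimTensor built with from_blob().
using TensorHandle = void*;

extern "C" AOTITorchError create_tensor_from_blob_sketch(
    void* data,            // caller-owned; the shim never frees it
    int64_t ndim,
    const int64_t* sizes,
    const int64_t* strides,
    int32_t dtype,         // one of the supported SlimTensor dtypes
    int32_t device_type,   // CPU or CUDA
    int32_t device_index,
    TensorHandle* out) {
  if (data == nullptr || sizes == nullptr || out == nullptr || ndim < 0) {
    return kError;
  }
  (void)strides; (void)dtype; (void)device_type; (void)device_index;
  *out = data;  // non-owning: no copy, no allocation in this sketch
  return kOk;
}

int main() {
  float buf[4] = {0};
  const int64_t sizes[] = {2, 2};
  const int64_t strides[] = {2, 1};
  TensorHandle handle = nullptr;
  AOTITorchError err = create_tensor_from_blob_sketch(
      buf, /*ndim=*/2, sizes, strides,
      /*dtype=*/0, /*device_type=*/0, /*device_index=*/0, &handle);
  return (err == kOk && handle == buf) ? 0 : 1;  // handle aliases buf
}
```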
Gasoonjia added a commit that referenced this pull request Jan 27, 2026
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* #16565
* #16551
* #16469
* #16457
* #16455
* #16454
* #16453
* #16452
* #16451
* #16450
* #16449
* #16448
* __->__ #16447
* #16446
* #16724

Add SlimTensor-based implementations of AOTI shim functions for tensor
creation:

`aoti_torch_create_tensor_from_blob_v2()` - creates a non-owning
SlimTensor that wraps existing memory using the `from_blob()` factory

These functions support CPU and CUDA devices and handle all 7 SlimTensor
dtypes.

Changes:
- Add `memory_slim.h` and `memory_slim.cpp` with SlimTensor-based shim
implementations
- Add a `runtime_shims_slim` library target to TARGETS with the
`CUDA_AVAILABLE=1` preprocessor flag (a sketch of this compile-time
switch follows after this commit message)
- Add a `cuda_shim_slim_cpp_unittest()` function for SlimTensor test
targets
Differential Revision:
[D90126244](https://our.internmc.facebook.com/intern/diff/D90126244/)
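
The `CUDA_AVAILABLE=1` flag is a compile-time switch, so the same shim source can build for CPU-only and CUDA targets. The helper below is a hypothetical illustration of that pattern; the device-type codes and the `device_is_supported` name are made up for the sketch.

```cpp
#include <cstdio>

#if defined(CUDA_AVAILABLE) && CUDA_AVAILABLE
#include <cuda_runtime.h>
#endif

// Placeholder device-type codes for the sketch.
constexpr int kCPU = 0;
constexpr int kCUDA = 1;

bool device_is_supported(int device_type) {
  if (device_type == kCPU) {
    return true;  // the CPU path is always compiled in
  }
#if defined(CUDA_AVAILABLE) && CUDA_AVAILABLE
  if (device_type == kCUDA) {
    int count = 0;  // only probe the driver when CUDA was compiled in
    return cudaGetDeviceCount(&count) == cudaSuccess && count > 0;
  }
#endif
  return false;  // CUDA requested but unavailable in this build
}

int main() {
  std::printf("CPU supported: %d\n", device_is_supported(kCPU));
  std::printf("CUDA supported: %d\n", device_is_supported(kCUDA));
  return 0;
}
```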
@larryliu0820 (Contributor) left a comment:

Review automatically exported from Phabricator review in Meta.

Gasoonjia added a commit that referenced this pull request Jan 29, 2026
Gasoonjia added a commit that referenced this pull request Jan 29, 2026
Gasoonjia added a commit that referenced this pull request Jan 29, 2026
@Gasoonjia temporarily deployed to upload-benchmark-results January 29, 2026 09:48 with GitHub Actions
Gasoonjia added a commit that referenced this pull request Jan 29, 2026
Gasoonjia added a commit that referenced this pull request Jan 29, 2026
@Gasoonjia temporarily deployed to upload-benchmark-results January 30, 2026 06:02 with GitHub Actions
Gasoonjia added a commit that referenced this pull request Jan 30, 2026
@Gasoonjia temporarily deployed to upload-benchmark-results January 30, 2026 07:56 with GitHub Actions
@meta-codesync bot merged commit 7ae87a5 into gh/gasoonjia/101/base Jan 30, 2026
202 of 210 checks passed
@meta-codesync bot deleted the gh/gasoonjia/101/head branch January 30, 2026 18:23
Gasoonjia added a commit that referenced this pull request Jan 30, 2026
This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: #16565 by
@Gasoonjia
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/gasoonjia/101/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/gasoonjia/101/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/gasoonjia/101/orig
Differential Revision:
[D90606409](https://our.internmc.facebook.com/intern/diff/D90606409/)
@diff-train-skip-merge

Co-authored-by: gasoonjia <gasoonjia@icloud.com>
larryliu0820 pushed a commit that referenced this pull request Feb 2, 2026

Labels

CLA Signed, fb-exported, meta-exported


2 participants